22 research outputs found

    Predicting Intraday Stock Returns by Integrating Market Data and Financial News Reports

    Forecasting in the financial domain is undoubtedly a challenging undertaking in data mining. While the majority of previous studies in this field utilize historical market data to predict future stock returns, we explore whether there is benefit in augmenting the prediction model with supplementary domain knowledge obtained from financial news reports. To this end, we empirically evaluate how the integration of these data sources helps to predict intraday stock returns. We consider several types of integration methods: variable-based as well as bundling methods. To discern whether the integration methods are sensitive to the type of forecasting algorithm, we have implemented each integration method using three different data mining algorithms. The results show several scenarios in which appending market-based data with textual news-based data helps to improve forecasting performance. Successful integration strongly depends on which forecasting algorithm and variable representation method are utilized. The findings are promising enough to warrant further studies in this direction.

    Who’s A Good Decision Maker? Data-Driven Expert Worker Ranking under Unobservable Quality

    Evaluation of expert workers by their decision quality has substantial practical value, yet using other expert workers for decision quality evaluation tasks is costly and often infeasible. In this work, we frame the Ranking of Expert workers according to their unobserved decision Quality (REQ) -- without resorting to evaluation by other experts -- as a new Data Science problem. This problem is challenging, as the correct decisions are commonly unobservable and substantial parts of the information available to the decision maker are not available for retrospective decision evaluation. We propose a new machine learning approach to address this problem. We evaluate our method on one dataset representing real expert decisions and two public datasets, and find that our approach is successful in generating highly accurate rankings. Moreover, we observe that our approach’s superiority over the baseline is particularly prominent as evaluation settings become increasingly challenging.

    Prediction in Economic Networks: Using the Implicit Gestalt in Product Graphs

    We define an economic network as a linked set of products, where links are created by realizations of shared outcomes between entities. We analyze the predictive information contained in an increasingly prevalent type of economic network, a “product network” that links the landing pages of goods frequently co-purchased on e-commerce websites. Our data include one million books in 400 categories spanning two years, with over 70 million observations. Using autoregressive and neural-network models, we demonstrate that combining historical demand of a product with that of its neighbors improves demand predictions even as the network changes over time. Furthermore, network properties such as clustering and centrality contribute significantly to predictive accuracy. To our knowledge, this is the first large-scale study showing that a non-static product network contains useful distributed information for demand prediction, and that this information is more effectively exploited by integrating composite structural network properties into one’s predictive models.

    Do Customers Speak Their Minds? Using Forums and Search for Predicting Sales

    A wide body of research uses data from social media websites to predict offline economic outcomes such as sales. However, in practice, such data are costly to collect and process. Additionally, sales forecasts based on social media data may be hampered by people’s tendency to restrict the topics they publicly discuss. Recently, a new source of predictive information, search engine logs, has become available. Interestingly, the relationship between these two important data sources has not been studied. Specifically, do they contain complementary information? Or does the information conveyed by one source render the information conveyed by the other source redundant? This study uses Google’s comprehensive index of internet discussion forums, in addition to Google search trend data. Predictive models based on search trend data are shown to outperform and complement forum-data-based models. Furthermore, the two sources display substantially different patterns of predictive capacity over time.

    Protein Dynamics in Individual Human Cells: Experiment and Theory

    A current challenge in biology is to understand the dynamics of protein circuits in living human cells. Can one define and test equations for the dynamics and variability of a protein over time? Here, we address this experimentally and theoretically, by means of accurate time-resolved measurements of endogenously tagged proteins in individual human cells. As a model system, we choose three stable proteins displaying cell-cycle–dependent dynamics. We find that protein accumulation with time per cell is quadratic for proteins with long mRNA lifetimes and approximately linear for a protein with a short mRNA lifetime. Both behaviors correspond to a classical model of transcription and translation. A stochastic model, in which genes slowly switch between ON and OFF states, captures measured cell–cell variability. The data suggest, in accordance with the model, that switching to the gene ON state is exponentially distributed and that the cell–cell distribution of protein levels can be approximated by a Gamma distribution throughout the cell cycle. These results suggest that relatively simple models may describe protein dynamics in individual human cells.

    Using Crowd-Based Data Selection to Improve the Predictive Power of Search Trend Data

    Large-scale data generated by crowds provide a myriad of opportunities for monitoring and modeling people's intentions, preferences, and opinions. A crucial step in analyzing such Big Data is identifying the relevant data items that should be provided as input to the modeling process. Interestingly, this important step has received limited attention in previous research. This paper proposes a novel crowd-based approach to this data selection problem: leveraging crowds to amplify the predictive capacity of search trend data (Google Trends). We developed an online word association task that taps into people's thought-collection process when thinking about a focal term. We empirically tested this method in two domains that have been used as test-beds for prediction. The method yields predictions that are equivalent or superior to those obtained in previous studies (using alternative data selection methods) and to predictions obtained using various benchmark data selection methods.

    The Predictive Power of Engagement in Mobile Consumption

    One of the prominent segments of mobile commerce is the mobile application market, where consumers download applications from an app store. Importantly, prior work showed that user behavior in mobile settings is substantially different from user behavior in PC settings, and therefore needs to be better understood. In this research, we study for the first time the predictive power of consumer engagement in such mobile settings. Using data from a leading commercial A/B testing platform specializing in app store design, we perform both in-sample assessment and predictive capacity evaluation of prediction models of app store conversion based on engagement information. Our findings show that in mobile settings, engagement-based models are highly informative for predicting conversion, and are consistent across different prediction methods (logistic regression, classification tree, and random forest). These findings indicate that engagement analytics may enhance our understanding of the app conversion process.